# cluster analysis

A form of multivariate analysis whose purpose is to divide a set of objects (such as variables or individuals), characterized by a number of attributes, into a set of clusters or classes, in such a way that the objects in a class are maximally similar to each other and maximally different from the objects in other classes, with reference to a selected list of descriptive indicators and characteristics which form the basis of the analysis. In biology the technique is known as numerical taxonomy.
Cluster analysis was among the multivariate statistical techniques developed in Social Area Analysis (1955) for analysing census data. It is applied to census small-area statistics and social indicators in social area analysis to create area typologies, either focusing on particular urban or metropolitan areas, or covering the country as a whole. Cluster analysis has since found a wide range of applications in other areas, including developmental work with opinion statements or questions from which an attitude scale will be formed; exploratory work to identify underlying patterns in large data-sets; analytical work to measure significant similarities and differences between individuals, social groups, companies or other types of organization, nation-states, types of event, and so forth; and the development of classifications and typologies.
Different ways of defining similarity and difference give rise to distinct methods of clustering, and alternative criteria for judging how well a solution fits the data will generally produce somewhat different results. Most classification procedures begin with a table of association, containing similarity or dissimilarity coefficients for each pair of objects, and then proceed in one of two ways: bottom up (where the objects are successively merged into larger clusters) or top down (where the entire set of objects is divided into increasingly small clusters). Either approach yields a hierarchical clustering scheme (HCS), which is represented by a dendrogram, or tree. An HCS is also often represented as a set of contours within a multidimensional scaling solution of the same data. The most common clustering method is stepwise hierarchical clustering, with output displayed as a dendrogram. The clustering passes through three or more intermediate levels of aggregation before all cases are combined in a single group at the final stage, and the dendrogram clearly identifies any outlier cases that remain separate from other cases until that final stage.
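The bottom-up procedure can be illustrated with a minimal Python sketch. It is not a reference implementation: the data points are hypothetical, Euclidean distance is assumed as the dissimilarity coefficient, and single linkage (the distance between the two closest members) is assumed as the rule for comparing clusters, since the entry does not prescribe a particular one. The function names `dissimilarity_table` and `agglomerate` are invented for the illustration.

```python
from itertools import combinations
from math import dist


def dissimilarity_table(points):
    """Euclidean distance between every pair of objects, keyed by index pair."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def agglomerate(points):
    """Bottom-up clustering (single linkage assumed); returns the merge history."""
    table = dissimilarity_table(points)
    # Start with every object in its own cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []  # one (cluster_a, cluster_b, merge_distance) tuple per step

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(table[(min(i, j), max(i, j))] for i in a for j in b)

    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical data: two tight pairs of objects and one outlier (index 4).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
merges = agglomerate(points)
for a, b, d in merges:
    print(sorted(a), "+", sorted(b), f"at distance {d:.2f}")
```

The merge history is the information a dendrogram draws: each tuple is one join in the tree, and the merge distance gives the height of that join. With these points the outlier stays on its own until the last step, which is the pattern described above. A different linkage rule (complete or average, for example) can be tried by changing only the `linkage` helper, and may produce a different hierarchy from the same table.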
Recent developments in this field include additive overlapping clustering (where each cluster has a measure of its importance), additive trees (where the length of the path between points represents the data dissimilarity), and rectangular clustering (where both the individuals and the variables of the data are clustered jointly).

Dictionary of sociology. 2013.

